This design page documents the different choices we discussed while designing the CLI alerts. The parent page represents the final version of the design.
Common execution environment in server-side and remote scripts (a)
Ability to upload a new script (b)
Ability to "reference" the script from an external location (c)
Execute scripts as alert notifications (d)
Ability to upload a new script while creating the alert definition/template (e)
Ability to edit the script assigned to the alert (f)
Execute scripts in other parts of the RHQ UI - TBD
As with bundles, the scripts are going to be stored using our content subsystem.
To be able to tap into the content subsystem a resource type is needed. The simplest approach is to leverage the existing rhq-script-plugin and its ScriptServer resource type for the CLI scripts. Another approach would be to add a new "dummy" resource type to the platform plugin called "Server-side Script" that wouldn't be discoverable (either automatically or manually) on the agent side.
Uploading new scripts is going to function similarly to how bundles work. There is going to be one (predefined?) repo (called "Uploaded Scripts", configurable through system properties) and the manually uploaded scripts are going to be added to it as packages with the contents stored in the database. The repo is going to have to be (referentially?) guarded so that the user doesn't accidentally delete it and the rest of the code is always able to refer to it (b).
Apart from that the UI is going to provide a way of using any package associated with the ScriptServer resource type in any repo. This is to support scenarios where the scripts are going to be stored externally and made accessible to RHQ server through some content source (c).
There are a couple of requirements the script store needs to fulfill:
store a number of scripts, each script can be referenced by multiple alert definitions
once some alert fires and executes a script, this script is referenced in the alert's history
an update to the script doesn't overwrite the content of the scripts referenced in alert histories, i.e. the alert always runs the latest version of the script, but the older versions remain accessible/referenceable from the alert histories.
Chosen approach: fully leverage existing content subsystem
Why: With the small modifications to the behavior mentioned below we can achieve both goals we're after: the content can be stored remotely and we can have full auditing.
The bundle subsystem deals with very similar requirements on the storage of the bundles. The simplest approach would therefore be to copy its approach and reimplement it for script storage. There is almost a one-to-one mapping of the bundle subsystem "actors" and script subsystem actors.
Bundle = Script
BundleType = ScriptType (the ScriptType here is just a single instance mapping the Script entity to a resource and package type)
BundleDestination = AlertDefinition
BundleDeployment = Alert
BundleFile = ScriptFile (bundles can have multiple files, whereas scripts just one)
BundleVersion = ScriptVersion (this is actually not needed for scripts because there is 1:1 mapping between a script and a package. We therefore don't need this additional level of indirection).
This high level of similarity can be leveraged to more or less copy the impl. of the bundle storage.
Pros: We have a functioning blueprint
Cons: Code duplication
As seen above, the bundle and script storage have a high degree of similarity. We could therefore try to come up with a generic interface to both bundle and script storage; let's call it "local binary storage". This would most probably be today's bundle storage system, from which the script storage would inherit and simplify some things (mainly hide the possibility of there being multiple files per BundleVersion).
Pros: Code reuse, building upon an already tested codebase
Cons: Is this really what we want? Aren't we trying to find similarities where there actually aren't any?
The bundle subsystem builds on top of the content subsystem because of its specific requirements for processing the bundle files and recipes on the server side, as well as for the sake of auditing (being able to say precisely what went out of RHQ). The price for this rigidity is the need to store the bundles in the database (the recent additions allow the Ant recipes to download the actual content from a URL, but the bundle itself still needs to be stored in the DB, I believe).
The content subsystem was originally designed to overcome this need and to allow accessing 3rd party content from 3rd party locations. If we just used the existing content subsystem to store the scripts (i.e. there would just be a package type "script" associated with the ScriptServer resource type and packages thereof), we'd enable more flexible storage of the scripts at the cost of "loosening" the referential integrity. The script content would be "pointed to" by a package but the content itself would be "in the hands" of the content source (and thus possibly remote and/or in the hands of a 3rd party).
This actually could work for scripts. We'd leverage the content source to download the content using its own mechanism, but as soon as we needed to reference the content in the alert history, we'd store the content in the database. This logic would require the content source to reliably advertise the package version so that at no point would there be two package versions with different content (i.e. we could fail to deliver the "intended" content if someone "swapped in" a different file in the remote datasource and didn't change the version). This way, though, we could achieve the "holy grail" of what we're after: the ability to have the content stored remotely and have the auditing sorted.
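The "store on first reference" rule described above can be sketched as follows. This is a hypothetical in-memory stand-in for the package version table and the alert-history snapshot, not actual RHQ code; the `contentSource.download` shape is an assumption made for the example.

```javascript
// Sketch: a content source advertises (name, version) pairs; the first
// time an alert references a version, its bits are copied into the local
// store so the alert history always resolves to those exact bits.
function createScriptStore(contentSource) {
  var snapshots = {}; // "name@version" -> pinned content bits

  return {
    // Called when an alert fires and needs the script content.
    contentForAlert: function (name, version) {
      var key = name + "@" + version;
      if (!(key in snapshots)) {
        // First reference from an alert: download and pin in the DB.
        snapshots[key] = contentSource.download(name, version);
      }
      // Later references serve the pinned copy; a remote swap under the
      // same version goes unnoticed, which is why the design insists the
      // content source reliably bumps versions when the content changes.
      return snapshots[key];
    }
  };
}
```

Note that if the remote file is replaced without a version bump, the pinned copy (not the new bits) keeps being served, which is exactly the caveat the paragraph above points out.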
Pros: flexibility in the content location, little new code needed, possibly usable by provisioning as well.
Cons: the package version storage handling as described above feels a bit hacky and would need careful specification.
If all the above is determined insufficient, we have no other option but to sit down and come up with a new way of storing the content and replace the content subsystem with something different. The most important requirement here is that this solution must be reusable by both the script and, more importantly, the bundle subsystem, so that we gain the new flexibility across all the subsystems that access remote content.
Pros: Now that we know all the reqs and the problems we've had with the existing solution, we can possibly design a better implementation of the content subsystem
Cons: possibly large amount of work, a number of new bugs will no doubt be introduced across different subsystems in RHQ
(a) Current remoting CLI defines an execution environment - predefined variables in the script's scope. To the extent it makes sense the same environment should be available to the scripts being executed on the server.
The following variables are available to the scripts in the remote client:
unlimitedPC - an unlimited page control
pageControl - another page control, not really sure about the difference
exporter - an object used to export the data into a file using the TabularWriter
subject - the currently logged in user
pretty - the pretty printer
scriptUtil - script utility methods
ProxyFactory - provides a proxied view on a resource (i.e. operations exposed as object's methods, etc.)
Assert - assertion utilities
configurationEditor - provides an interactive configuration editor
rhq - provides login and logout methods
Apart from the above variables, all the remote interfaces existing in RHQ are exposed as variables like this:
ResourceManagerRemote interface is present as ResourceManager variable, etc.
Additionally, all the methods of scriptUtil, Assert, configurationEditor and rhq objects are also available as "global" methods that call the appropriate methods on the objects themselves. The ProxyFactory exposes the editPluginConfiguration() and editResourceConfiguration() methods on the resource proxies that provide the interactive editing of the configuration.
In the server environment, the interactive parts of the above don't make sense and thus won't be available in the script scope. The script running on the server is going to be provided with a subject it is authenticated with, so providing login and logout methods via the rhq variable doesn't make sense either.
On the server, therefore, the following environment will be available in the scripts:
unlimitedPC - an unlimited page control
pageControl - another page control, not really sure about the difference
exporter - an object used to export the data into a file using the TabularWriter
subject - the currently logged in user (i.e. the user executing the script on the server)
pretty - the pretty printer
scriptUtil - script utility methods
ProxyFactory - provides a proxied view on a resource (i.e. operations exposed as object's methods, etc.)
Assert - assertion utilities
With the script storage sorted, the alert notifications suddenly become quite simple.
The alert notification is going to be implemented as a new server-side alert plugin. Upon the firing of the alert, the notification is going to instantiate a new scripting engine and execute the associated script (d). It is necessary to instantiate a new scripting engine for each alert notification to prevent the concurrency issues possible if we tried to reuse a single scripting engine for all alerts. If we find this too expensive during testing, we'll have to switch to some locking scheme guarding access to the engine.
The script is going to be provided with the standard bindings as defined above, plus the actual alert being triggered under the variable called, unsurprisingly, alert.
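A notification script might then look like the following sketch. The `alert` and `ResourceManager` bindings below are stand-ins defined inline only so the example is self-contained; in reality they are injected by the alert sender, and the property and method names on the stubs are illustrative, not actual RHQ API.

```javascript
// Hypothetical stand-in for the injected "alert" binding: the firing
// alert, its definition and the condition logs that triggered it.
var alert = {
  alertDefinition: { name: "High heap usage" },
  conditionLogs: [{ condition: "heap > 90%", value: "93%" }]
};

// Hypothetical stand-in for a remote interface exposed as a variable
// (the real ones follow the ResourceManagerRemote -> ResourceManager
// naming convention described earlier).
var notifications = [];
var ResourceManager = {
  noteResource: function (msg) { notifications.push(msg); }
};

// The "script" itself: react to the firing alert using the bindings.
function handleAlert() {
  var def = alert.alertDefinition;
  alert.conditionLogs.forEach(function (log) {
    ResourceManager.noteResource(
      def.name + ": " + log.condition + " (was " + log.value + ")");
  });
  return notifications.length;
}
```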
The alert definition is going to require a custom UI with which a user will be able to pick the appropriate script from some repo. The UI is also going to provide a way to upload a new script directly within the alert notification definition. Such a script will be stored in the predefined Uploaded Scripts repo (e).
For the first iteration, support for in-line editing of the scripts inside the alert definition isn't planned, because we'd have to add logic around the possibility of two alert defs using the same script overwriting each other's changes. The usual workflow for the alert scripts should be develop -> test in the remote CLI -> use in an alert notification, so supporting inline edits could encourage users to make untested edits. This feature, although cool, isn't therefore considered important (f).
For the script to be able to perform actions in RHQ, it needs to be logged in somehow.
Chosen approach: admin-defined script users
Why: It offers the optimal balance between implementation complexity and functionality. It is also similar to how the usual (*NIX) security works, where each privileged service has a dedicated user with tailored privileges.
Hard-coding the credentials in the script is very inflexible. The script has a free choice of the user but is going to break if the user's credentials change, the user is deleted or their privileges change.
Pros: trivial to implement
Cons: inflexible
Trivial but insecure. If we used this approach, we'd essentially give each user able to define an alert full access to the system.
Pros: trivial to implement
Cons: insecure
We'd add a facility to mark certain users as "script users". One of such users would be selected during the alert definition as the user to run the script.
This is much more flexible than the previous approach but introduces a kind of security hole. There are going to be scripts that require strong privileges, and thus a pretty privileged user will have to exist to run them. If we don't guard who can select which script users, we allow this strong access to the inventory to anyone who's able to define an alert. This could of course be solved by "annotating" each such user with a set of roles that the users defining the alerts have to be in to be able to select the given script user.
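The role gate proposed above amounts to a simple subset check: the alert definer may select a script user only if they hold every role that user is annotated with. A minimal sketch, with hypothetical names:

```javascript
// Hypothetical sketch of the guard: a "script user" carries the set of
// roles an alert definer must be in to be allowed to select it.
function canSelectScriptUser(definerRoles, scriptUser) {
  // every required role must be among the definer's roles
  return scriptUser.requiredRoles.every(function (role) {
    return definerRoles.indexOf(role) >= 0;
  });
}
```

So a strongly privileged script user annotated with, say, "Super User" cannot be picked by someone who only holds "Operator", which closes the hole described above.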
Adding these new capabilities could be done on the core server level, where we could add some stronger referential checks to enforce the integrity of the setup, but that wouldn't be architecturally ideal because we'd be adding capabilities to the core server that are only there to support a single (and optional) plugin.
The other possibility is to store this setup as the plugin configuration of the alert sender. With that we lose the referential integrity but keep the architecture clean. It would also be more logical for the user to set up some "infrastructure" (i.e. the needed users) and configure the single plugin to use them than to try to set up a privilege system generically, not knowing what it is actually for.
Pros: flexible, admins used to think this way
Cons: additional work needed to make this secure
This is by far the most flexible approach. During the alert definition the user would select the roles that are needed to run the script successfully. The system would then "fake" a user assigned to those roles and inject this faked user into the script. This of course has the same security implications as the admin-defined script users, and the same kind of solution: each role would have to be annotated with a set of roles the user defining the alert has to be in to be able to select the given role.
Another drawback of this method is that our authorization framework requires "real" users to exist to perform the security checks. Because we fake the users here, we'd actually have to persist these faked users and mark them as "hidden". There would have to be a 1:1 mapping between a hidden fake user and an alert. There would also be complications during role deletion, which should not fail even if some hidden users were still assigned to the role.
Pros: flexible, admins/users sort of used to think this way
Cons: a lot of additional setup needed for this to work securely
This is a weird one. The CLI alerts are implemented by a plugin, and we shouldn't have anything in our data model that serves only a particular, optional plugin. At the same time, we want quite tight integration of the scripts with, for example, the alert history views, from which it should be distinguishable what script was run by a plugin.
Chosen Approach: ?
Why:
One way of doing this I can see is to add a kind of "notification attachments" to the alert history entry, defined by the individual notification senders at the time of the alert firing. While creating the alert history entry, each alert sender that ran on that particular alert could add an attachment to it. An attachment could be just a configuration object stored along with the history entry, keyed by the alert sender's name.

Upon rendering the alert history, we'd have a new column called "Attachments" containing "links" named after the individual alert senders (for those alert senders that contributed an attachment to the history entry). Upon clicking such a link, the plugin would be asked to interpret the stored configuration and return another configuration object representing the attachment (in GWT we don't support per-plugin custom UI elements, so we have to use an object that we can render, even though this is not ideal).

In the case of the CLI alert sender, the attachment config would contain the id of the package version containing the script that ran for that particular alert, and the returned configuration object would then contain a single property with the contents of that script. This indirection is needed because a) we can't store LOBs inside configuration objects (the returned configuration would never be persisted, and therefore it is OK for it to contain a "large" string) and b) we would waste DB space by duplicating the content bits for each fired alert.

This of course has the implication of never being able to delete a server plugin if it was ever referenced from the alert histories. The plugin could be disabled so that no new alert definitions could be created for it, but that would also prevent the histories from rendering the attachments correctly. We can either call it a known deficiency or detect this situation and temporarily start up the plugin just for that single rendering.
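The attachment flow above can be sketched as follows. The shapes of the history entry, the stored configuration and the plugin `interpret` callback are hypothetical illustrations of the proposal, not an existing RHQ API.

```javascript
// Hypothetical sketch of "notification attachments": each sender that
// fired on an alert stores a small config keyed by its own name; the
// plugin interprets that config lazily at render time, so the large
// content (the script body) is never persisted per fired alert.
function createAlertHistoryEntry() {
  return { attachments: {} }; // senderName -> stored config object
}

// Called by a sender while the history entry is being created.
function attach(entry, senderName, config) {
  entry.attachments[senderName] = config;
}

// Called when the user clicks the sender's "link" in the Attachments
// column: the plugin turns the stored config into a renderable one.
function renderAttachment(entry, senderName, plugins) {
  var stored = entry.attachments[senderName];
  return plugins[senderName].interpret(stored);
}
```

For the CLI sender, the stored config would carry only the package version id, and `interpret` would look up the script contents on demand, mirroring the indirection described above.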
The main drawback of this solution is the fact that the package version remains referentially unprotected from deletion. Fixing that is of course impossible without modifying the schema, which we try to avoid in this approach, which leads us to a bit of a catch-22 situation.
Pros: generic solution
Cons: quite a bit of work, complications with disabling and deleting the server plugin, no referential guard preventing the package version from deletion
From the architectural point of view this is a hack, because we're adding something into the "core" data model that is only used by a plugin. On the other hand, it is a quite simple and effective way to referentially guard the package version from deletion by some other subsystem.
Pros: simple, effective
Cons: a hack
This approach is to an extent a combination of the previous two, with one major drawback: we'd require more than just the usual CRUD privileges on the database.
Pros: architecturally clean(er)
Cons: the user the RHQDS connects as has to be able to create (possibly alter) tables and create foreign keys